library(tidyverse)
library(ggplot2)
library(stringr)
library(tidytext)
library(plotly)
library(wordcloud)
library(wordcloud2)
library(reshape2)
library(babynames)

Keystone Graphic

knitr::include_graphics("keystonegraphic.png")

The keystone graphic is a word cloud for July 2001, in the middle of the California energy crisis and shortly before Skilling resigned as CEO. In our investigation, this is a time of extreme emotional expression in the Enron email dataset, and the word cloud reveals both the overall sentiment at the time (negative) and the subject matter. We explore this graphic in more depth later in our analysis.

Introduction

The energy, commodities, and services company Enron Corporation was one of the largest companies in the United States prior to its demise in the early 2000s (the company officially filed for bankruptcy on December 2, 2001). According to Encyclopedia Britannica, the “collapse of Enron, which held more than $60 billion in assets, involved one of the biggest bankruptcy filings in the history of the United States, and it generated much debate as well as legislation designed to improve accounting standards and practices, with long-lasting repercussions in the financial world.”

The scandal, which encompassed all of the events leading up to the bankruptcy, made headlines all over the world. During the 2002 investigation of Enron, the US Federal Energy Regulatory Commission (FERC) examined emails sent and received by the company’s employees. During the investigation, the FERC posted data regarding the emails on the web, making it public.

According to William Cohen at Carnegie Mellon University, this dataset was “collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes)” and “contains data from about 150 users, mostly senior management of Enron, organized into folders.” Overall, the dataset contains about 500,000 messages and is now available on Kaggle. According to Kaggle, “the Enron email dataset contains approximately 500,000 emails generated by employees of the Enron Corporation.”

For our analysis, we read in the Enron email dataset, which we downloaded as a .csv file from Kaggle. The dataset contains two variables: file (the original directory and filename of each email) and message (the email text).

We seek to explore the following question: Is there evidence for an increase in negative sentiment in emails sent by Enron employees during the onset of the Enron scandal?

In addition, we seek to learn how to use sentiment analysis in conjunction with other data science techniques. This exploration treats data science as much an art as a science. We lean on time-series graphs, radar charts, and word clouds to answer our question.

Methodology

This dataset is messy, so we first need to clean it, creating variables for the email body, header (including the subject line), and date. In an attempt to remove spam from the dataset and make it more manageable, we filtered out forwarded messages.

enron <- read_csv("/Users/mauralonergan/Downloads/emails.csv")

enron_mostly_clean <- enron %>%
  mutate(message = message %>% str_replace_all('\r', '')) %>%
  mutate(header = str_sub(message, end = str_locate(message, '\n\n')[,1] -1)) %>%
  mutate(body = str_sub(message, start = str_locate(message, '\n\n')[,2] + 1) %>% 
           str_replace_all('\n|\t', ' ') %>% 
           str_replace_all('---Original Message .*', 'FORWARDED_MESSAGE') %>% 
           str_replace_all('--- Forwarded by .*', 'FORWARDED_MESSAGE') %>%
           str_replace_all('From: .*', 'FORWARDED_MESSAGE') %>%
           str_replace('To:.*', 'FORWARDED_MESSAGE') %>%
           str_replace_all('\\S*@\\S*', 'EMAIL_ADDRESS')) 
enron_clean <- enron_mostly_clean %>%
  mutate(date = str_extract(header, 'Date:.*') %>% 
           str_replace('Date: ', '') %>% 
           str_replace('.+, ', '') %>% 
           strptime(format = '%d %b %Y %H:%M:%S %z') %>%
           as.POSIXct()) %>%
  mutate(from = str_extract(header, 'From:.*') %>% 
           str_replace('From: ', '')) %>%
  mutate(to = header %>% str_replace_all('\n|\t', ' ') %>%
           str_extract('To:.*Subject:') %>%
           str_replace_all('To: |Subject:', '')) %>%
  mutate(subject = str_extract(header, 'Subject:.*') %>% 
           str_replace('Subject: ', '')) %>%
  mutate(xfrom = str_extract(header, 'X-From:.*') %>% 
           str_replace('X-From: ', '')) %>%
  mutate(xto = str_extract(header, 'X-To:.*') %>% 
           str_replace('X-To: ', '')) %>%
  mutate(xcc = str_extract(header, 'X-cc:.*') %>% 
           str_replace('X-cc: ', '')) %>%
  mutate(xbcc = str_extract(message, 'X-bcc:.*') %>% 
           str_replace('X-bcc: ', '')) 

enron_no_forward <- enron_clean %>%
  filter(body != "-------------------FORWARDED_MESSAGE")
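As a quick sanity check on the date parsing used above, the same sequence of steps can be applied to a single hypothetical “Date:” header (the string below is an illustration in the same format as the Enron headers, not a real message):

```r
library(stringr)

# hypothetical header line in the same format as the Enron emails
hdr <- "Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)"

d <- str_replace(hdr, "Date: ", "")  # drop the field name
d <- str_replace(d, ".+, ", "")      # drop the weekday ("Mon, ")
parsed <- as.POSIXct(strptime(d, format = "%d %b %Y %H:%M:%S %z"))

format(parsed, "%Y-%m")  # "2001-05" (assuming an English locale for %b)
```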

To focus on non-spam messages, we also filter to messages that were sent by Enron employees themselves (those in “sent” folders). We also create month and year variables for the date a given message was sent, which we will use later.

sent_emails <- enron_no_forward %>%
  filter(str_detect(file, "sent"))

sent_date <- sent_emails %>%
  mutate(month = format(date, "%m")) %>%
  mutate(year = format(date,"%Y")) 

We then create a sentiment score for each message. We unnest tokens, splitting each message into words, and use the Bing sentiment lexicon to assign an overall sentiment score by subtracting the number of negative words a message contains from the number of positive words.

#Bing
sent_bing <- sent_date %>%
  unnest_tokens(word, body) %>%
  inner_join(get_sentiments("bing")) %>%
  count(message, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(total_sentiment = positive - negative)


sent_joined <- left_join(sent_date,
                         sent_bing,
                         by = "message")

#Afinn

sent_afinn <- sent_date %>%
  unnest_tokens(word, body) %>%
  inner_join(get_sentiments("afinn")) %>%
  group_by(message) %>%
  summarise(afinn_score = sum(score)) # note: newer tidytext releases name this column `value` rather than `score`

sent_a_b<- left_join(sent_joined,
                      sent_afinn,
                      by = "message")


#NRC
sent_nrc <- sent_date %>%
  unnest_tokens(word, body) %>%
  inner_join(get_sentiments("nrc")) %>%
  count(message, sentiment) %>%
  spread(sentiment, n, fill = 0)

sent_emotion <- left_join(sent_a_b,
                           sent_nrc,
                           by = "message")

sent_emotion[is.na(sent_emotion)]<- 0

After cleaning the data, we begin our investigation with a tally of sent emails over the years spanned by the dataset. Notice that there are far fewer emails sent before 2000 than after, which motivates us to filter the data when tracking sentiment over time in later plots. Another interesting feature is the significant decrease in email volume in the summer of 2001, the same period in which CEO Jeffrey Skilling suddenly resigned and the California energy crisis was reaching its peak. We will later see that this time is also associated with a spike in negative sentiment.

sent_emotion %>%
  group_by(year, month) %>%
  summarise(n_emails = n()) %>%
  mutate(date = as.Date(paste(year, month, 15, sep = "/"))) %>%
  ggplot(aes(x = date, y = n_emails)) +
  geom_line() +
  labs(title = "Number of Sent Emails over Time", x = "Date", y = "Emails Sent") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Before diving into sentiment analysis, we wanted to be sure that the lexicons we use are comparable and tracking the correct sentiments in emails. To test agreement between the Bing and AFINN lexicons, we plot each message’s total sentiment score under one lexicon against its score under the other.

From this plot, there appears to be a strong correlation of sentiment between the Bing and AFINN sentiment lexicons, as points fall largely in the blue regions that mark overall agreement.

sent_emotion %>%
  filter(total_sentiment < 50, total_sentiment > -50) %>%
  filter(afinn_score < 100, afinn_score > -100) %>%
  ggplot(aes(x = total_sentiment, y = afinn_score)) +
  annotate("rect", xmin = 0, xmax = Inf, ymin = 0, ymax = Inf, fill = "blue", alpha = 0.3) +
  annotate("rect", xmin = -Inf, xmax = 0, ymin = -Inf, ymax = 0, fill = "blue", alpha = 0.3) +
  annotate("rect", xmin = 0, xmax = Inf, ymin = -Inf, ymax = 0, fill = "red", alpha = 0.3) +
  annotate("rect", xmin = -Inf, xmax = 0, ymin = 0, ymax = Inf, fill = "red", alpha = 0.3) +
  geom_point() +
  labs(title = "AFINN vs. Bing Sentiment Scores",
       x = "Bing Sentiment Score", y = "AFINN Sentiment Score") +
  theme(plot.title = element_text(hjust = 0.5))
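The visual impression of agreement can also be quantified. As a rough sketch (reusing the sent_emotion table and the same filters as the plot above), one could compute a Pearson correlation alongside the share of messages whose two scores share a sign:

```r
sent_emotion %>%
  filter(total_sentiment < 50, total_sentiment > -50) %>%
  filter(afinn_score < 100, afinn_score > -100) %>%
  summarise(
    pearson   = cor(total_sentiment, afinn_score),                # linear agreement
    same_sign = mean(sign(total_sentiment) == sign(afinn_score))  # quadrant agreement
  )
```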

We also measure specific emotions using the NRC sentiment lexicon. Grouping our data by month and year, we can find the average score in each of the NRC emotions each month and plot these emotions over time. We also normalised these emotion scores so that the magnitudes of sentiment shifts can be compared directly, as some emotions were consistently more common than others.

There is also code below to prepare a radar diagram of emotion distributions at two different dates, which will be shown in the Results section.

radar_data <- sent_emotion%>%
  group_by(year,month)%>%
  summarise(anticipation = mean(anticipation),
            trust = mean(trust),
            fear = mean(fear),
            anger = mean(anger),
            joy = mean(joy),
            disgust = mean(disgust),
            negative = mean(negative.y),
            positive = mean(positive.y),
            sadness = mean(sadness),
            surprise = mean(surprise))%>%
  mutate(norm_anticipation = anticipation/mean(anticipation),
         norm_trust = trust/mean(trust),
         norm_fear = fear/mean(fear),
         norm_anger = anger/mean(anger),
         norm_joy = joy/mean(joy),
         norm_disgust = disgust/mean(disgust),
         norm_negative = negative/mean(negative),
         norm_positive = positive/mean(positive),
         norm_sadness = sadness/mean(sadness),
         norm_surprise = surprise/mean(surprise))

radar1 <- t(radar_data)

# rows 13-22 of the transposed matrix hold the ten normalised emotion scores;
# columns 20 and 34 select the two months compared below (summer 2000 and summer 2001)
radar2000 <- radar1[13:22, 20]
radar2001 <- radar1[13:22, 34]

radar_2000_data <- data.frame(labels = colnames(radar_data)[13:22],
                              values = as.numeric(radar2000))

radar_2001_data <- data.frame(labels = colnames(radar_data)[13:22],
                              values = as.numeric(radar2001))

range01 <- function(x){(x-min(x))/(max(x)-min(x))}
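The range01 helper rescales any numeric vector linearly onto [0, 1], sending the minimum to 0 and the maximum to 1. For example:

```r
range01 <- function(x){(x - min(x)) / (max(x) - min(x))}

range01(c(2, 4, 6, 10))  # 0.00 0.25 0.50 1.00
```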

As a preliminary investigation into specific sentiments over time, we grouped positive (blue) and negative (red) emotions and plotted them below. Again, we filtered the data to dates after 2000 due to the small sample sizes at earlier dates.

Notice that there are two clear peaks in emotion overall: the summer of 2001, when the value of the company was being questioned and senior staff were leaving, and 2000, when the company’s stock hit its peak.

It is interesting to note that both positive and negative NRC sentiments rose at both of these times, suggesting either that the models are not perfect at differentiating types of emotion, or that during both positive and negative times emails conveyed more emotion of all kinds. One reason for the latter explanation could be that the emails were more personal; it is known, for example, that Enron management sent confident memos to staff during the company’s downfall.

radar_data%>%
  filter(year>=2000)%>%
  mutate(date = as.Date((paste(year,month,01,
                               sep = "/"))))%>%
  ggplot(aes(x = date, 
             y =(norm_fear+norm_anger+norm_disgust+norm_sadness+norm_negative)/5))+
  geom_line(color = "Red")+
  geom_line(aes(x = date, y = (norm_joy+norm_trust+norm_positive)/3), color = "Blue")+
  theme_bw()+
  labs(title = "Grouped Positive and Negative NRC Sentiments", 
       x = "Date",
       y = "Normalised NRC Sentiment")

Results

To address our research question of whether sentiment in Enron’s emails fell as the company approached lawsuits and bankruptcy, we tracked the average score over time using the AFINN lexicon. We chose the AFINN lexicon because it weights words beyond a simple “positive”/“negative” split while remaining one-dimensional. These two characteristics make it more sensitive to extreme language and a convenient way to track overall sentiment, leaving the breakdown by emotion to later figures. This plot spans 2000 through 2002.

Important dates to put this graph in context are August 2000, when Enron stock reached its peak price per share at over $90, and August 2001 when Skilling suddenly resigned in the midst of the California Energy Crisis, which was later linked to Enron.

sent_emotion %>%
  filter(year >= 2000) %>%
  group_by(year, month) %>%
  summarise(avg = mean(afinn_score, na.rm = TRUE)) %>%
  mutate(date = as.Date(paste(year, month, 15, sep = "/"))) %>%
  ggplot(aes(x = date, y = avg)) +
  geom_line() +
  geom_smooth(color = "Blue") +
  labs(title = "Average AFINN Sentiment Score over Time",
       x = "Date", y = "Average AFINN Sentiment Score") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

We see that there is a peak in overall sentiment in the summer of 2000, when Enron’s stock price peaked and the company was named “Most Innovative Company” for the third time. This is followed by a consistent fall in sentiment to a local minimum in the summer of 2001, and a further decline as Enron declares bankruptcy and legal investigations begin in 2002.

Investigating NRC sentiments at the two times discussed, summer 2000 and summer 2001, we can see how individual sentiments shift between positive and negative periods. This radar diagram shows average monthly sentiment scores, standardised between 0 and 1 for both dates, since the earlier NRC line graph showed that the overall level of emotion in emails varied dramatically. With this standardisation, values closer to 1 are more common. We believe standardising our values makes for a better comparison over time, as it lets us see which emotions are more prevalent, contextualised by the overall level of emotion.

plot_ly(
  type = 'scatterpolar',
  fill = 'toself'
) %>%
  add_trace(
    r = range01(radar_2000_data$values),
    theta = c("Anticipation","Trust", "Fear","Anger",
              "Joy","Disgust","Negative","Positive",
              "Sadness","Surprise"),
    name = '2000',
    color = I("midnightblue")
  ) %>%
  add_trace(
    r = range01(radar_2001_data$values),
    theta = c("Anticipation","Trust", "Fear","Anger",
              "Joy","Disgust","Negative","Positive",
              "Sadness","Surprise"),
    name = '2001',
    color = I("firebrick2")
  ) %>%
  layout(
    polar = list(
      radialaxis = list(
        visible = T,
        range = c(0,1)
      )
    )
  )

Notice the changing vertex positions between the two years. In 2000, at Enron’s peak, the emotions Trust, Fear, Joy, Anticipation, and general positive sentiment were higher than a year later. Though Fear is not a positive emotion, we believe it is linked to conversations about stock prices, as the summer of 2000 was slightly after the peak. Despite this, the diagram shows a clear increase in negative sentiments in 2001, with the most common emotions becoming Anger, Disgust, Sadness, and general negative sentiment.

Before creating the word clouds, we remove stop words, and also filter out numbers, punctuation, and names.

babynames <- babynames %>%
  mutate(name = str_to_lower(name))

`%notin%` <- Negate(`%in%`) #function for the opposite of %in%

enron_body <- sent_emotion %>% 
  unnest_tokens(word, body) %>%
  count(word,
        sort = TRUE) %>%
  anti_join(stop_words) %>%
  filter((str_detect(word, "[:digit:]"))== F) %>% #get rid of numbers
  filter((str_detect(word, "[:punct:]")) == F) %>% # get rid of punctuation
  filter(word %notin% str_to_lower(babynames$name)) # get rid of names
  

enron_forcloud <- enron_body %>%
  inner_join(get_sentiments("nrc")) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("tomato3", "pink", "darkorange4", "red", "green", "royalblue4", "chartreuse3", "slategray4", "orange", "seagreen3"),
                   max.words = 250, title.size = 1.5)

We first create a word cloud encompassing words from all emails dated after 1979. Overall, we see evidence of bankruptcy-related concern in words like “bankruptcy”, “risk”, and “crisis”.

These words are notably larger in the word cloud which focuses on the time right before the then-CEO Jeffrey Skilling resigned.

We create a word cloud that focuses on July 2001, the month before Skilling’s resignation. At this time, a number of negative words related to bankruptcy and the unfolding scandal dominate the word cloud, such as “bankruptcy”, “failure”, “regulatory”, “deal”, “crisis”, and “debt”. Of course, given the nature of the scandal, “electricity” (what Enron specialized in) and “money” (the root of the scandal) are among the most common words in emails. The word “jail” even appears.

enron_body07.2001 <- sent_emotion %>% 
  filter(year == 2001)%>%
  filter(month == "07")%>%
  unnest_tokens(word, body) %>%
  count(word,
        sort = TRUE) %>%
  anti_join(stop_words) %>%
  filter((str_detect(word, "[:digit:]"))== F) %>% #get rid of numbers
  filter((str_detect(word, "[:punct:]")) == F) %>% # get rid of punctuation
  filter(word %notin% str_to_lower(babynames$name)) # get rid of names



enron_cloud <- enron_body07.2001 %>%
  inner_join(get_sentiments("nrc")) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("tomato3", "pink", "darkorange4", "red", "green", "royalblue4", "chartreuse3", "slategray4", "orange", "seagreen3"), 
                   max.words = 250, title.size = 1.5)

Note that this graphic may appear slightly different in shape from the keystone graphic included above. When we knit in R, the graphic is generated again with a new random layout; the keystone graphic came from a word cloud that we downloaded after running the code in the console.
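The run-to-run variation comes from comparison.cloud() itself, which places words using the random number generator. If an exactly reproducible layout across knits is wanted, one option (a sketch, not part of our original pipeline) is to fix the seed immediately before plotting:

```r
set.seed(2001)  # any fixed value; makes the word placement repeatable across knits
enron_body07.2001 %>%
  inner_join(get_sentiments("nrc")) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(max.words = 250, title.size = 1.5)
```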

In the “joy” category, there are some words that may be negative in the context of the emails. The frequency of “resources” may reflect some concern of the employees about gathering the necessary resources/materials to grapple with the scandal.

We also create a word cloud for December 2001, when the company filed for bankruptcy. The second biggest word (after “wireless”, which we are still working on understanding) is, understandably, “bankruptcy”. Other common words include “termination” and “change”, both reflective of a sensitive time of transition for the company. Under the “surprise” category, the word “leave” appears large. While characterized as a surprise word, it could arguably also fall under the “negative” or “disgust” categories.

enron_body12.2001 <- sent_emotion %>% 
  filter(year == 2001)%>%
  filter(month == "12")%>%
  unnest_tokens(word, body) %>%
  count(word,
        sort = TRUE) %>%
  anti_join(stop_words) %>%
  filter((str_detect(word, "[:digit:]"))== F) %>% #get rid of numbers
  filter((str_detect(word, "[:punct:]")) == F) %>% # get rid of punctuation
  filter(word %notin% str_to_lower(babynames$name)) # get rid of names

enron_cloud <- enron_body12.2001 %>%
  inner_join(get_sentiments("nrc")) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("tomato3", "pink", "darkorange4", "red", "green", "royalblue4", "chartreuse3", "slategray4", "orange", "seagreen3"), 
                   max.words = 250, title.size = 1.5)

Viewing the three word clouds (the all-inclusive word cloud and the ones that zero in on July 2001 and December 2001) in conjunction with one another helps us see what kinds of matters were pressing at Enron over different time frames. The word cloud covering the larger time frame shows evidence of negative sentiment in the aggregate (“change”, “risk”, and “bad” are among the largest words), but such sentiment may be harder to see given the frequency of less emotionally charged, more business-oriented words like “presto”, “vacation”, and “time”.

Conclusion

Overall, we find evidence for an increase in negative sentiment during tumultuous moments of the Enron scandal, both through a general decrease in sentiment from the stock price peak to the bankruptcy, and through changes in the most prevalent emotions at specific moments. The overall decrease in sentiment can be seen in the line graph of sentiment over time, while the investigation into specific emotions and subjects is captured in the radar diagram and word clouds.

From a learning perspective, we are pleased that we were able to draw a conclusion from the sentiment analysis of a real dataset, and with the methods available in R for presenting such results.

Even now, we have a few remaining questions about our results and the data that a future project might be able to answer. For example, why is there a significant dip in the number of emails sent in 2001? The shape of our curve, with a steady increase in emails sent in the early years, suggests that some emails may have been redacted from this dataset. We are curious how this affects our results, given that the fall in emails corresponds to an emotional time for Enron.

Furthermore, we would be interested in investigating the dataset at a more granular level, mapping real-world events to emotional responses on a finer scale than by month.